VAST draft

"UBA-PICCOLI-MC1"

VAST 2012 Challenge

Mini-Challenge 1: Bank of Money Enterprise: Cyber Situation Awareness

Team Members:

Eloisa Piccoli, Buenos Aires University, elopiccoli@gmail.com PRIMARY

Guido Sidoni, Buenos Aires University, guidolo@gmail.com

Santiago Banchero, Buenos Aires University, santiagobanchero@gmail.com

Student Team: YES

Tool(s):

PostgreSQL

SQLServer

SQL Analysis Services

Tableau

Video:

VAST 2012 Challenge

MC 1.1 Create a visualization of the health and policy status of the entire Bank of Money enterprise as of 2 pm BMT (BankWorld Mean Time) on February 2. What areas of concern do you observe?

Given the large number of entries in the dataset, we sacrificed detailed information in order to obtain a complete picture of the Bank’s health status. We chose a treemap (Fig. 1) at business unit level, colored by policy status. This graph gives a general overview and also helps identify areas of concern.

As a first assessment, we set the color coding to show the highest policy status reached by any machine in the business unit. The size of each rectangle is proportional to the logarithm of the number of online machines in the region.
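The encoding described above can be sketched as a small aggregation. This is only an illustration in Python, not the queries we ran; the function name `treemap_encoding` and the `(business_unit, policy_status)` record shape are assumptions made for the example.

```python
import math
from collections import defaultdict


def treemap_encoding(records):
    """Aggregate (business_unit, policy_status) pairs for online machines.

    Returns, per business unit, the highest policy status seen (used for
    color) and the natural log of the machine count (used for size).
    The record layout is hypothetical, chosen only for this sketch.
    """
    highest = defaultdict(int)
    count = defaultdict(int)
    for unit, policy in records:
        highest[unit] = max(highest[unit], policy)
        count[unit] += 1
    return {
        unit: {"color_policy": highest[unit], "size": math.log(count[unit])}
        for unit in count
    }
```

Note that a unit with a single online machine gets size 0 under a plain logarithm, so a plotting tool may need a small offset to keep such units visible.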

There are no business units whose highest policy status is 1 or 2; every unit has at least one machine with a policy status higher than 2. For most of them the highest status is policy 4 (notice that the large BUs are all dark orange). The other big group shows policy 3 as the highest, and only one region presents policy status 5, which is the riskiest situation. We therefore took the Headquarters region as our focal point for further investigation. We explored the area and found that the machine under policy 5 is located in Datacenter 2. Given the purpose of a datacenter, we would like to highlight that the risk of having a machine at policy 5 in this type of unit is higher than having it at a branch affiliate. In a second treemap (Fig. 2) we set the color coding to show the predominant policy status. Regions 5 and 10 are the only ones that do not present policy 1 as the main value; all other regions have more than 80% of their machines under this healthy policy level.

Fig. 1: Business unit and highest policy

Fig. 2: Business unit and predominant policy
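The predominant-policy coloring of Fig. 2 amounts to taking the most frequent policy status per unit together with its share of machines. A minimal sketch follows, assuming the same hypothetical `(business_unit, policy_status)` records; breaking ties toward the lower policy level is our own assumption for the example.

```python
from collections import Counter, defaultdict


def predominant_policy(records):
    """Return {unit: (most_frequent_policy, share_of_machines)}.

    Ties are broken toward the lower policy level (an assumption made
    for this sketch only).
    """
    per_unit = defaultdict(Counter)
    for unit, policy in records:
        per_unit[unit][policy] += 1

    result = {}
    for unit, counts in per_unit.items():
        # Highest count wins; among equal counts prefer the lower policy.
        policy, n = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
        result[unit] = (policy, n / sum(counts.values()))
    return result
```

A unit can then be compared against the 80% mark by checking the second element of its tuple.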

MC 1.2 Use your visualization tools to look at how the network’s status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?

For this exercise we worked with both UTC timestamps and local time. We identified the following anomalies:

1. Over the reported timeframe we identified a strong upward trend in policy levels. Computers in a non-risky situation (Policies 1 & 2) in the early stages ended up showing either critical policy deviations or possible virus infection (Policies 4 & 5).

Fig. 3.A shows the number of computers under policy levels 4 and 5 over the three days. The count clearly rises for both policies; however, policy 4 increases more, and much faster, than policy 5.

Fig. 3.B shows the evolution across policy levels for those computers that reached policy 5 at least once during the analyzed period. There is a clear tendency to move from healthy policies to riskier ones. Very few machines recovered and went back to normal status (shown by the small grey circles inside the red ones during the latest hours); the majority do not recover.

Fig. 3.A: Policy 4 & 5 growing curve. Fig. 3.B: Policy evolution of those machines that reached policy 5
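The selection behind Fig. 3.B (machines that reached policy 5 at least once, and whether they later recovered) can be illustrated as follows. This is a hedged sketch, not our actual workflow: the `(machine_id, timestamp, policy)` row shape is hypothetical, and "recovered" is taken to mean that the last reported policy is 1 or 2, following the non-risky levels named above.

```python
from collections import defaultdict


def policy5_trajectories(status_rows):
    """Collect the policy history of machines that ever reached policy 5.

    status_rows: iterable of (machine_id, timestamp, policy); timestamps
    must be sortable (e.g. integers or ISO-formatted strings).
    """
    history = defaultdict(list)
    for machine, timestamp, policy in status_rows:
        history[machine].append((timestamp, policy))

    result = {}
    for machine, points in history.items():
        points.sort()  # chronological order
        policies = [policy for _, policy in points]
        if 5 in policies:
            result[machine] = {
                "policies": policies,
                # Assumption: recovery means ending at policy 1 or 2.
                "recovered": policies[-1] <= 2,
            }
    return result
```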

2. Region 25 presents abnormal behavior from 02/02 09:15AM until 02/03 00:00AM local time (UTC time: 02/02 12:15PM until 02/03 03:15AM). Many machines are not reported during this timeframe, causing a decrease in the number of connections. Fig. 4.A clearly shows that Region 25 (red line) does not follow the same pattern as the other small regions, which are aligned with one another (grey scale). The number of online computers in Region 25 decreased considerably during this period.

Fig. 4.B shows how long the facilities of Region 25 were online during 02/02, expressed in local hours. Only a few buildings, such as Branch 11, Branch 21, Branch 22 and Branch 31, maintained uninterrupted connectivity. All the ones that do not present a solid line along the timeframe went down unexpectedly for a few hours. We say unexpectedly based on the following analysis. Since Activity 2 indicates that a computer will shortly go down for maintenance, we assessed whether Region 25 presented an increase in this activity level shortly before the anomaly occurred. We present this analysis in Fig. 4.C, where all regions, including Region 25, show similar behaviour with respect to Activity 2. The decrease in connectivity was therefore not planned by the organization and must have an unexpected cause.

Fig. 4: Multigraph to show different aspects of the odd behaviour at Region 25
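The visual comparison in Fig. 4.A (one region departing from the pattern of its peers) could also be approximated numerically. The sketch below flags snapshots where a region's online count falls below a fraction of the median of the other regions; the function name, the nested-dictionary input and the 0.5 threshold are all assumptions chosen for illustration, not values taken from our analysis.

```python
from statistics import median


def low_connectivity_snapshots(online, region, threshold=0.5):
    """Return the timestamps where `region` falls well below its peers.

    online: {timestamp: {region_name: online_machine_count}}
    threshold: hypothetical fraction of the peer median below which a
    snapshot is flagged.
    """
    flagged = []
    for timestamp in sorted(online):
        counts = online[timestamp]
        peers = [n for name, n in counts.items() if name != region]
        if not peers:
            continue  # nothing to compare against in this snapshot
        if counts.get(region, 0) < threshold * median(peers):
            flagged.append(timestamp)
    return flagged
```

This only makes sense among regions of comparable size, which is why Fig. 4.A restricts the comparison to the small regions.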

3. Region 10 shows a peak of connections during non-working hours on February 3rd. This anomaly starts at 2:15AM and ends at 5:15AM the same day (UTC: 8:15AM to 11:15AM) and is caused by tellers’ connections. In Fig. 5.A we show the number of connections made by this type of machine over the day. The abnormal peak is clearly represented by the red line (Region 10).

Fig. 5.B shows that the number of online tellers does not match the peak in Fig. 5.A, meaning the peak is not caused by more machines being turned on but by an increase in the number of connections of the tellers already reporting.

This is confirmed in Fig. 5.C, where only Region 10 presents connection counts higher than 50.

Investigating further, we included the Activity level in the analysis (Fig. 5.D). In a normal situation, tellers report Activity levels 3, 4 and 5 during prime time, as the bar chart indicates. If the connection peak we observed were due to tellers starting work earlier, the reported activity should match the activity seen during normal working hours (7AM-6PM); however, these activity levels are not shown within the peak timeframe.

Fig. 5.A: Line graph to show number of connections of tellers as of 02/03 by local hours. Fig. 5.B: Line graph to show number of online tellers as of 02/03 by local hours. Fig. 5.C: Bar chart counting number of connections greater than 50 by hours. Fig. 5.D: Bar chart to show tellers’ count of activity 3, 4 and 5 performed by hours on 02/03
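The reasoning behind Figs. 5.A to 5.C (a peak in connections without a matching peak in online tellers) can be illustrated with an hourly profile. This is a minimal sketch under assumed inputs: rows of `(hour, machine_id, num_connections)` for the tellers of one region, with the threshold of 50 taken from the text and every name hypothetical.

```python
from collections import defaultdict


def teller_hourly_profile(rows, high=50):
    """Summarize teller reports per hour for a single region.

    rows: iterable of (hour, machine_id, num_connections).
    Returns, per hour, the number of distinct online tellers, the total
    connections, and how many reports exceed `high` connections.
    """
    machines = defaultdict(set)
    total = defaultdict(int)
    heavy = defaultdict(int)
    for hour, machine, connections in rows:
        machines[hour].add(machine)
        total[hour] += connections
        if connections > high:
            heavy[hour] += 1
    return {
        hour: {
            "online_tellers": len(machines[hour]),
            "connections": total[hour],
            "reports_over_threshold": heavy[hour],
        }
        for hour in sorted(machines)
    }
```

An hour in which `connections` jumps while `online_tellers` stays flat corresponds to the situation described for Region 10.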

4. Datacenter 5 presents different behavior on 02/02 regarding the time at which its servers start to be reported. Fig. 6 shows this anomaly starting at 8:15AM and lasting until 7:00PM UTC. Until 12:30 the number of reported servers does not exceed 243 per snapshot, compared with 49,000 for the other datacenters. We analyzed whether this was related to the machine function and found that the only function that shows normal behavior is Office.

Fig. 6: Line chart to compare the time when machines were turned on
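One way to compare when each datacenter "starts to be reported", as Fig. 6 does visually, is to find the first snapshot at which it reaches most of its own daily peak. The sketch below is illustrative only; the input shape, the function name and the 0.9 fraction are assumptions, not parameters from our analysis.

```python
def first_full_report(counts, fraction=0.9):
    """Return {datacenter: first timestamp reaching fraction * daily peak}.

    counts: {datacenter: [(timestamp, reported_servers), ...]} with at
    least one snapshot per datacenter; timestamps must be sortable.
    fraction: hypothetical share of the peak (between 0 and 1).
    """
    result = {}
    for datacenter, series in counts.items():
        ordered = sorted(series)
        peak = max(n for _, n in ordered)
        # The peak itself always satisfies the condition, so next() succeeds.
        result[datacenter] = next(
            ts for ts, n in ordered if n >= fraction * peak
        )
    return result
```

A datacenter whose timestamp is much later than the others would stand out in the same way Datacenter 5 does in Fig. 6.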